POSTERIOR PROPRIETY IN BAYESIAN EXTREME VALUE
ANALYSES USING REFERENCE PRIORS 

Paul J. Northrop and Nicolas Attalides

University College London 


Abstract: The Generalized Pareto (GP) and Generalized extreme value (GEV) distributions play an 
important role in extreme value analyses, as models for threshold excesses and block maxima respectively. 
For each of these distributions we consider Bayesian inference using “reference” prior distributions (in the 
general sense of priors constructed using formal rules) for the model parameters, specifically a Jeffreys prior, 
the maximal data information (MDI) prior and independent uniform priors on separate model parameters. 
We investigate the important issue of whether these improper priors lead to proper posterior distributions. 
We show that, in the GP and GEV cases, the MDI prior, unless modified, never yields a proper posterior 
and that in the GEV case this also applies to the Jeffreys prior. We also show that a sample size of three 
(four) is sufficient for independent uniform priors to yield a proper posterior distribution in the GP (GEV) 
case. 

Key words and phrases: Extreme value theory, generalized extreme value distribution, generalized Pareto 
distribution, posterior propriety, reference prior. 


1. Introduction 


Extreme value theory provides an asymptotic justification for particular families of models for
extreme data. Let $X_1, X_2, \ldots, X_N$ be a sequence of independent and identically distributed random
variables. Let $u_N$ be a threshold, increasing with $N$. Pickands (1975) showed that if there is a
non-degenerate limiting distribution for appropriately linearly rescaled excesses of $u_N$ then this limit is a
Generalized Pareto (GP) distribution. In practice, a suitably high threshold $u$ is chosen empirically.
Given that there is an exceedance of $u$, the excess $Z = X - u$ is modelled by a GP($\sigma_u, \xi$) distribution,
with threshold-dependent scale parameter $\sigma_u$, shape parameter $\xi$ and distribution function

$$F_{GP}(z) = \begin{cases} 1 - (1 + \xi z/\sigma_u)_+^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp(-z/\sigma_u), & \xi = 0, \end{cases} \tag{1.1}$$

where $z > 0$, $z_+ = \max(z, 0)$, $\sigma_u > 0$ and $\xi \in \mathbb{R}$. The use of the generalized extreme value (GEV)
distribution (Jenkinson, 1955), with distribution function

$$F_{GEV}(y) = \begin{cases} \exp\left\{-\left[1 + \xi(y-\mu)/\sigma\right]_+^{-1/\xi}\right\}, & \xi \neq 0, \\ \exp\left\{-\exp\left[-(y-\mu)/\sigma\right]\right\}, & \xi = 0, \end{cases} \tag{1.2}$$

where $\sigma > 0$ and $\mu, \xi \in \mathbb{R}$, as a model for block maxima is motivated by considering the behaviour
of $Y = \max\{X_1, \ldots, X_b\}$ as $b \to \infty$ (Fisher and Tippett, 1928; Leadbetter et al., 1983).
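
As a small numerical companion to (1.1) and (1.2), the following Python sketch evaluates the two distribution functions. The function names `gp_cdf` and `gev_cdf` are illustrative choices rather than anything defined in the paper, and the treatment of arguments outside the support is an assumption that follows the convention $x_+ = \max(x, 0)$.

```python
import math


def gp_cdf(z, sigma, xi):
    """GP distribution function (1.1) for an excess z, with sigma > 0."""
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z / sigma)
    t = 1.0 + xi * z / sigma
    if t <= 0.0:
        # Only possible for xi < 0: z is at or beyond the upper end point -sigma/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gev_cdf(y, mu, sigma, xi):
    """GEV distribution function (1.2), with sigma > 0."""
    if xi == 0.0:
        return math.exp(-math.exp(-(y - mu) / sigma))
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        # Below the lower end point (xi > 0) or above the upper end point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


if __name__ == "__main__":
    # With xi = -1 the GP distribution is uniform on (0, sigma).
    print(gp_cdf(0.5, 1.0, -1.0))
    print(gev_cdf(0.0, 0.0, 1.0, 0.0))
```

The $\xi = 0$ branches are handled separately because the general expressions are only defined there as limits.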


Commonly used frequentist methods of inference for extreme value distributions are maximum
likelihood estimation (MLE) and probability-weighted moments (PWM). However, conditions on
$\xi$ are required for the asymptotic theory on which inferences are based to apply: $\xi > -1/2$ for
MLE (Smith, 1984, 1985) and $\xi < 1/2$ for PWM (Hosking et al., 1985; Hosking and Wallis, 1987).
Alternatively, a Bayesian approach (Coles, 2001; Coles and Powell, 1996; Stephenson and Tawn, 2004)
can avoid conditions on the value of $\xi$ and performs predictive inference about future observations
naturally and conveniently using Markov chain Monte Carlo (MCMC) output. A distinction can
be made between subjective analyses, in which the prior distribution supplies information from an
expert (Coles and Tawn, 1996) or more general experience of the quantity under study (Martins
and Stedinger, 2000, 2001), and so-called objective analyses (Berger, 2006). In the latter, a prior is
constructed using a formal rule, for use when no subjective information is to be incorporated into
the analysis. There is disagreement about appropriate terminology for such priors: we follow Kass
and Wasserman (1996) in using the term reference prior.


Many such formal rules have been proposed: Kass and Wasserman (1996) provide a comprehensive
review. In this paper we consider three priors that have been used in extreme value analyses:
the Jeffreys prior (Eugenia Castellanos and Cabras, 2007; Beirlant et al., 2004), the maximal data
information (MDI) prior (Beirlant et al., 2004), and the uniform prior (Pickands, 1994). These priors
are improper, that is, they do not integrate to a finite number and therefore do not correspond to a
proper probability distribution. An improper prior can lead to an improper posterior, which is clearly
undesirable. There is no general theory providing simple conditions under which an improper prior
yields a proper posterior for a particular model, so this must be investigated case-by-case. Eugenia
Castellanos and Cabras (2007) establish that the Jeffreys prior for the GP distribution always yields a
proper posterior, but no such results exist for the other improper priors we consider. It is important
that posterior propriety is established because impropriety may not create obvious numerical
problems: for example, MCMC output may appear perfectly reasonable (Robert and Casella, 1996).
One way to ensure posterior propriety is to use a diffuse proper prior, such as a normal prior
with a large variance (Coles and Tawn, 2005; Smith, 2005), or to truncate an improper prior
(Smith and Goodman, 2000). For example, Coles (2001, chapter 9) uses a GEV($\mu, \sigma, \xi$) model for
annual maximum sea-levels, placing independent normal priors on $\mu$, $\log\sigma$ and $\xi$ with respective
variances $10^4$, $10^4$ and 100. However, one needs to check that the posterior is not sensitive to the
choice of proper prior and, as Bayarri and Berger (2004) note “... these posteriors will essentially be
meaningless if the limiting improper objective prior would have resulted in an improper posterior
distribution.” Therefore, independent uniform priors on separate model parameters are of interest
in their own right and as the limiting case of independent diffuse normal priors.


In section 2 we give the general form of the three priors we consider in this paper. In section
3 we investigate whether or not these priors yield a proper posterior distribution given a random
sample $z = (z_1, \ldots, z_m)$ from the GP distribution, and, in cases where propriety is possible, we
derive sufficient conditions for this to occur. We repeat this for a random sample $y = (y_1, \ldots, y_n)$
from a GEV distribution in section 4. In section 5 we discuss some implications of these results and
possible extensions. Proofs of results are presented in the appendix.
2. Reference priors for extreme value distributions


Let $Y$ be a random variable with density function $f(Y \mid \phi)$, indexed by a parameter vector $\phi$,
and define the Fisher information matrix by $I(\phi)_{ij} = \mathrm{E}\left[-\partial^2 \ln f(Y \mid \phi)/\partial\phi_i\,\partial\phi_j\right]$.


Uniform priors. Priors that are flat, i.e. equal to a positive constant, suffer from the problem
that they are not automatically invariant to reparameterisation: for example, if we give $\log\sigma$ a
uniform distribution then $\sigma$ is not uniform. Thus, it matters which particular parameterization is
used to define the prior.


Jeffreys priors. Jeffreys’ “general rule” (Jeffreys, 1961) is

$$\pi_J(\phi) \propto \det(I(\phi))^{1/2}. \tag{2.1}$$

An attractive property of this rule is that it produces a prior that is invariant to reparameterization.
Jeffreys suggested a modification of this rule for use in location-scale problems. We will follow this
modification, which is summarised on page 1345 of Kass and Wasserman (1996). If there is no
location parameter then (2.1) is used. If there is a location parameter $\mu$, say, then $\phi = (\mu, \theta)$ and

$$\pi_J(\mu, \theta) \propto \det(I(\theta))^{1/2}, \tag{2.2}$$

where $I(\theta)$ is calculated holding $\mu$ fixed. In the current context the GP distribution does not have
a location parameter whereas the GEV distribution does.


MDI prior. The MDI prior (Zellner, 1971) is defined as

$$\pi_M(\phi) \propto \exp\{\mathrm{E}[\log f(Y \mid \phi)]\}. \tag{2.3}$$

This is the prior for which the increase in average information, provided by the data via the likelihood
function, is maximised. For further information see Zellner (1998).

3. Generalized Pareto (GP) distribution 


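Without loss of generality we take the $m$ threshold excesses to be ordered: $z_1 < \cdots < z_m$. For
simplicity we denote the GP scale parameter by $\sigma$ rather than $\sigma_u$. We consider a class of priors of
the form $\pi(\sigma, \xi) \propto \pi(\xi)/\sigma$, $\sigma > 0$, $\xi \in \mathbb{R}$, where $\pi(\xi)$ is a function depending only on $\xi$, that is, a
priori $\sigma$ and $\xi$ are independent and $\log\sigma$ has an improper uniform prior over the real line.


The posterior is given by

$$\pi(\sigma, \xi \mid z) = C_m^{-1}\, \pi(\xi)\, \sigma^{-(m+1)} \prod_{i=1}^{m} (1 + \xi z_i/\sigma)^{-(1+1/\xi)}, \qquad \sigma > 0,\ \xi > -\sigma/z_m,$$

where

$$C_m = \int_{-\infty}^{\infty} \int_{\max(0, -\xi z_m)}^{\infty} \pi(\xi)\, \sigma^{-(m+1)} \prod_{i=1}^{m} (1 + \xi z_i/\sigma)^{-(1+1/\xi)} \, d\sigma \, d\xi \tag{3.1}$$

and the inequality $\xi > -\sigma/z_m$ comes from the constraints $1 + \xi z_i/\sigma > 0$, $i = 1, \ldots, m$, in the
likelihood.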
3.1 Prior densities


Using (2.1) with $\phi = (\sigma, \xi)$ gives the Jeffreys prior

$$\pi_{J,GP}(\sigma, \xi) \propto \frac{1}{\sigma\,(1+\xi)\,(1+2\xi)^{1/2}}, \qquad \sigma > 0,\ \xi > -1/2.$$

Eugenia Castellanos and Cabras (2007) show that a proper posterior density results for $m \geq 1$.


Using (2.3) gives the MDI prior

$$\pi_{M,GP}(\sigma, \xi) \propto \frac{1}{\sigma}\, e^{-(\xi+1)} \propto \frac{1}{\sigma}\, e^{-\xi}, \qquad \sigma > 0,\ \xi \in \mathbb{R}. \tag{3.2}$$

Beirlant et al. (2004, page 447) use this prior but they do not investigate the propriety of the
posterior.

Placing independent uniform priors on $\log\sigma$ and $\xi$ gives the prior

$$\pi_{U,GP}(\sigma, \xi) \propto \frac{1}{\sigma}, \qquad \sigma > 0,\ \xi \in \mathbb{R}. \tag{3.3}$$

This prior was proposed by Pickands (1994).


Figure 1 shows the Jeffreys and MDI priors for GP parameters as functions of $\xi$. The MDI
prior increases without limit as $\xi \to -\infty$.
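
To make the contrast between these priors concrete, the sketch below evaluates the $\xi$-components of the three GP priors up to proportionality. It is only an illustration: the function names are hypothetical, and the Jeffreys component is assumed to take the standard form $(1+\xi)^{-1}(1+2\xi)^{-1/2}$ for the GP distribution.

```python
import math


def jeffreys_gp_xi(xi):
    """xi-component of the GP Jeffreys prior (assumed form), for xi > -1/2."""
    if xi <= -0.5:
        raise ValueError("the GP Jeffreys prior requires xi > -1/2")
    return 1.0 / ((1.0 + xi) * math.sqrt(1.0 + 2.0 * xi))


def mdi_gp_xi(xi):
    """xi-component exp{-(xi + 1)} of the GP MDI prior (3.2)."""
    return math.exp(-(xi + 1.0))


def uniform_gp_xi(xi):
    """xi-component of the uniform prior (3.3): constant in xi."""
    return 1.0


if __name__ == "__main__":
    # The MDI component keeps growing as xi decreases.
    for xi in (-20.0, -5.0, -1.0, 0.0, 1.0):
        print(xi, mdi_gp_xi(xi))
```

Printing the MDI component over a grid of negative $\xi$ values shows the unbounded growth that theorem 2 identifies as the source of posterior impropriety.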


Figure 1: Scaled Jeffreys and MDI GP prior densities against $\xi$.


3.2 Results 

Theorem 1. A sufficient condition for the prior $\pi(\sigma, \xi) \propto \pi(\xi)/\sigma$, $\sigma > 0$, $\xi \in \mathbb{R}$ to yield a proper
posterior density function is that $\pi(\xi)$ is (proportional to) a proper density function.


The MDI prior (3.2) does not satisfy the condition in theorem 1 because $\exp\{-(\xi + 1)\}$ is not
a proper density function on $\xi \in \mathbb{R}$.
Theorem 2. There is no sample size for which the MDI prior (3.2) yields a proper posterior density
function.


The problem with the MDI prior is due to its behaviour for negative $\xi$ so a simple solution is to
place a lower bound on $\xi$ a priori. This approach is common in extreme value analyses: for example,
Martins and Stedinger (2001) constrain $\xi$ to $(-1/2, 1/2)$ a priori. We suggest

$$\pi_{M,GP}(\sigma, \xi) \propto \frac{1}{\sigma}\, e^{-(\xi+1)}, \qquad \sigma > 0,\ \xi \geq -1, \tag{3.4}$$

that is, a (proper) unit exponential prior on $\xi + 1$. Any finite lower bound on $\xi$ ensures propriety
of the posterior but $\xi = -1$, for which the GP distribution reduces to a uniform distribution on
$(0, \sigma)$, seems less arbitrary than other choices as it corresponds to a change in the behaviour of the
GP density. For $\xi > -1$, the GP density $f_{GP}(z)$ decreases in $z$, which is what one anticipates when
conducting an extreme value analysis to make inferences about future large, rare values. For $\xi < -1$,
$f_{GP}(z)$ increases without limit as it approaches its mode at the upper end point $-\sigma/\xi$, behaviour
that is not expected in such analyses.


Corollary to theorem 1. The truncated MDI prior (3.4) yields a proper posterior density function
for $m \geq 1$.


Theorem 3. A sufficient condition for the uniform prior (3.3) to yield a proper posterior density
function is that $m \geq 3$.


4. Generalized extreme value (GEV) distribution 

Without loss of generality we take the $n$ block maxima to be ordered: $y_1 < \cdots < y_n$. We
consider a class of priors of the form $\pi(\mu, \sigma, \xi) \propto \pi(\xi)/\sigma$, $\sigma > 0$, $\mu, \xi \in \mathbb{R}$, that is, a priori $\mu$, $\sigma$ and $\xi$
are independent and $\mu$ and $\log\sigma$ have improper uniform priors over the real line.

Based on a random sample $y_1, \ldots, y_n$ the posterior density for $(\mu, \sigma, \xi)$ is proportional to

$$\sigma^{-(n+1)}\, \pi(\xi)\, \exp\left\{-\sum_{i=1}^{n} z_i^{-1/\xi}\right\} \prod_{i=1}^{n} z_i^{-(1+1/\xi)}, \tag{4.1}$$

where $z_i = 1 + \xi(y_i - \mu)/\sigma$ and $\sigma > 0$. If $\xi > 0$ then $\mu - \sigma/\xi < y_1$ and if $\xi < 0$ then $\mu - \sigma/\xi > y_n$.

4.1 Prior densities 

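Kotz and Nadarajah (2000, page 63) give the Fisher information matrix for the GEV distribution
(1.2). Using (2.2) with $\phi = (\mu, \sigma, \xi)$ gives the Jeffreys prior

$$\pi_{J,GEV}(\mu, \sigma, \xi) \propto \frac{1}{\sigma \xi^2} \left\{ \left[1 - 2\Gamma(2+\xi) + p\right] \left[\frac{\pi^2}{6} + \left(1 - \gamma + \frac{1}{\xi}\right)^2 - \frac{2q}{\xi} + \frac{p}{\xi^2}\right] - \left[1 - \gamma - q + \frac{1 - \Gamma(2+\xi) + p}{\xi}\right]^2 \right\}^{1/2}, \qquad \mu \in \mathbb{R},\ \sigma > 0,\ \xi > -1/2, \tag{4.2}$$

where $p = (1+\xi)^2\, \Gamma(1+2\xi)$, $q = \Gamma(2+\xi)\{\psi(1+\xi) + (1+\xi)/\xi\}$, $\psi(r) = d \log \Gamma(r)/dr$ and $\gamma \approx
0.57722$ is Euler’s constant. van Noortwijk et al. (2004) give an alternative form for the Jeffreys
prior, based on (2.1).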
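
The sketch below evaluates the $\xi$-component of (4.2) in two ways: from the determinant form above, and from the decomposition $\pi^2(\xi) = (T_1 + T_2)/\xi^4$ used in the proof of theorem 5, as a consistency check between the two expressions. The function names are illustrative, and the digamma function is approximated by a central difference of `math.lgamma`; that approximation is an implementation shortcut assumed here, not something the paper prescribes.

```python
import math

EULER_GAMMA = 0.5772156649015329
ZETA2 = math.pi ** 2 / 6.0


def digamma(x, h=1e-5):
    """Central-difference approximation to psi(x) = d log Gamma(x) / dx."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)


def jeffreys_gev_xi_det(xi):
    """xi-component of (4.2) from the determinant form; xi > -1/2, xi != 0."""
    g = math.gamma(2.0 + xi)
    p = (1.0 + xi) ** 2 * math.gamma(1.0 + 2.0 * xi)
    q = g * (digamma(1.0 + xi) + (1.0 + xi) / xi)
    a = 1.0 - EULER_GAMMA
    first = 1.0 - 2.0 * g + p
    second = ZETA2 + (a + 1.0 / xi) ** 2 - 2.0 * q / xi + p / xi ** 2
    cross = a - q + (1.0 - g + p) / xi
    return math.sqrt(first * second - cross ** 2) / xi ** 2


def jeffreys_gev_xi_sum(xi):
    """Same quantity via pi(xi)^2 = (T1 + T2) / xi^4, as in (6.7)-(6.9)."""
    g = math.gamma(2.0 + xi)
    psi = digamma(1.0 + xi)
    a = 1.0 - EULER_GAMMA
    t1 = (ZETA2 + a ** 2) * (1.0 + xi) ** 2 * math.gamma(1.0 + 2.0 * xi)
    t2 = (ZETA2
          + (2.0 * a * (EULER_GAMMA + psi) - 2.0 * ZETA2) * g
          - ((1.0 + psi) * g) ** 2)
    return math.sqrt(t1 + t2) / xi ** 2


if __name__ == "__main__":
    for xi in (-0.3, 0.5, 1.0, 3.0):
        print(xi, jeffreys_gev_xi_det(xi), jeffreys_gev_xi_sum(xi))
```

Both forms lose accuracy very close to $\xi = 0$ because of cancellation, so the sketch should only be used away from that point.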
Beirlant et al. (2004, page 435) give the form of the MDI prior:

$$\pi_{M,GEV}(\mu, \sigma, \xi) = \frac{1}{\sigma}\, e^{-\gamma(\xi + 1 + 1/\gamma)} \propto \frac{1}{\sigma}\, e^{-\gamma(1+\xi)}, \qquad \sigma > 0,\ \mu, \xi \in \mathbb{R}. \tag{4.3}$$

Placing independent uniform priors on $\mu$, $\log\sigma$ and $\xi$ gives the prior

$$\pi_{U,GEV}(\mu, \sigma, \xi) \propto \frac{1}{\sigma}, \qquad \sigma > 0,\ \mu, \xi \in \mathbb{R}. \tag{4.4}$$

Figure 2 shows the Jeffreys and MDI priors for GEV parameters as functions of $\xi$. The MDI prior
increases without limit as $\xi \to -\infty$ and the Jeffreys prior increases without limit as $\xi \to \infty$ and as
$\xi \downarrow -1/2$.


Figure 2: Scaled Jeffreys and MDI GEV prior densities against $\xi$.


4.2 Results 

Theorem 4. For the prior $\pi(\mu, \sigma, \xi) \propto \pi(\xi)/\sigma$, $\sigma > 0$, $\mu, \xi \in \mathbb{R}$ to yield a proper posterior density
function it is necessary that $n \geq 2$ and, in that event, it is sufficient that $\pi(\xi)$ is (proportional to) a
proper density function.


Theorem 5. There is no sample size for which the Jeffreys prior (4.2) yields a proper posterior
density function.

Truncation of the independence Jeffreys prior to $\xi \leq \xi_+$ would yield a proper posterior density
function if $n \geq 2$. In this event theorem 4 requires only that $\int_{-1/2}^{\xi_+} \pi(\xi)\, d\xi$ is finite, where here
$\pi(\xi) = \sigma\, \pi_{J,GEV}(\mu, \sigma, \xi)$ (see (4.2)). From the proof of theorem 5 we have
$\pi(\xi) < 2\left[\pi^2/6 + (1-\gamma)^2\right]^{1/2} (1 + 2\xi)^{-1/2}$ for $\xi \in (-1/2, -1/2 + \epsilon)$, where $\epsilon > 0$. Therefore,

$$\int_{-1/2}^{-1/2+\epsilon} \pi(\xi)\, d\xi < 2\left[\pi^2/6 + (1-\gamma)^2\right]^{1/2} \int_{-1/2}^{-1/2+\epsilon} (1 + 2\xi)^{-1/2}\, d\xi = 2^{3/2}\, \epsilon^{1/2} \left[\pi^2/6 + (1-\gamma)^2\right]^{1/2}.$$

The integral over $(-1/2 + \epsilon, \xi_+)$ is also finite. However, the choice of an a priori upper limit for $\xi$
may be less obvious than the choice of a lower limit.


Theorem 6. There is no sample size for which the MDI prior (4.3) yields a proper posterior density
function.


As in the GP case, truncating the MDI prior to $\xi \geq -1$, that is,

$$\pi_{M,GEV}(\mu, \sigma, \xi) \propto \frac{1}{\sigma}\, e^{-\gamma(1+\xi)}, \qquad \mu \in \mathbb{R},\ \sigma > 0,\ \xi \geq -1, \tag{4.5}$$

is one way to yield a proper posterior distribution.


Corollary to theorem 4. The truncated MDI prior (4.5) yields a proper posterior density function
for $n \geq 2$.


Theorem 7. A sufficient condition for the uniform prior (4.4) to yield a proper posterior density
function is that $n \geq 4$.

5. Discussion 

We have shown that some of the reference priors used, or proposed for use, in extreme value
modelling do not yield a proper posterior distribution unless we are willing to truncate the possible
values of $\xi$ a priori. An interesting aspect of our findings is that the Jeffreys prior (4.2) for GEV
parameters fails to yield a proper posterior, whereas the uniform prior (4.4) requires only weak
conditions to ensure posterior propriety. This is the opposite of more general experience, summarised
by Berger (2006, page 393) and Yang and Berger (1998, page 5), that the Jeffreys prior almost always
yields a proper posterior whereas a uniform prior often fails to do so. The impropriety of the posterior
under the Jeffreys prior is due to the high rate at which the component $\pi(\xi)$ of this prior increases
for large $\xi$. An alternative prior based on Jeffreys’ general rule (2.1) (van Noortwijk et al., 2004)
also has this property.


The conditions sufficient for posterior propriety under the uniform priors (3.3) and (4.4) are
weak. Therefore, a posterior yielded by diffuse normal priors is meaningful, but such a prior could
be replaced by an improper uniform prior. Although it is reassuring to know that a posterior is
proper, with a sufficiently informative sample posterior impropriety might not present a practical
problem (Kass and Wasserman, 1996, section 5.2). This may explain why Beirlant et al. (2004,
pages 435 and 447) obtain sensible results using (untruncated) MDI priors. However, the posterior
impropriety may be evident for smaller sample sizes.

In making inferences about high quantiles of the marginal distribution of $X$, the GP model
for threshold excesses is combined with a binomial($N, p_u$) model for the number of excesses, where
$p_u = P(X > u)$. Reference priors for a binomial probability have been studied extensively, see, for
example, Tuyl et al. (2009). An approximately equivalent approach is the non-homogeneous Poisson
process (NHPP) model (Smith, 1989), which is parameterized in terms of GEV parameters $\mu$, $\sigma$ and
$\xi$ relating to the distribution of $\max\{X_1, \ldots, X_b\}$. Suppose that $m$ observations $x_1, \ldots, x_m$ exceed $u$.
Under the NHPP the posterior density for $(\mu, \sigma, \xi)$ is proportional to

$$\sigma^{-(m+1)}\, \pi(\xi)\, \exp\left\{-n\left[1 + \xi\, \frac{u - \mu}{\sigma}\right]^{-1/\xi}\right\} \prod_{i=1}^{m} \left[1 + \xi\, \frac{x_i - \mu}{\sigma}\right]^{-(1+1/\xi)}, \tag{5.1}$$
where $n$ is the (notional) number of blocks into which the data are divided in defining $(\mu, \sigma, \xi)$.
Without loss of generality, we take $n = m$. The exponential term in (5.1) is an increasing function
of $u$, and $x_i > u$, $i = 1, \ldots, m$. Therefore,

$$\exp\left\{-n\left[1 + \xi\, \frac{u - \mu}{\sigma}\right]^{-1/\xi}\right\} < \exp\left\{-\sum_{i=1}^{m} \left[1 + \xi\, \frac{x_i - \mu}{\sigma}\right]^{-1/\xi}\right\}$$

and (5.1) is less than

$$\sigma^{-(m+1)}\, \pi(\xi)\, \exp\left\{-\sum_{i=1}^{m} \left[1 + \xi\, \frac{x_i - \mu}{\sigma}\right]^{-1/\xi}\right\} \prod_{i=1}^{m} \left[1 + \xi\, \frac{x_i - \mu}{\sigma}\right]^{-(1+1/\xi)}. \tag{5.2}$$

Equation (5.2) is of the same form as (4.1), with $n = m$ and $y_i = x_i$, $i = 1, \ldots, n$. Therefore,
theorems 4 and 7 apply to the NHPP model, that is, for posterior propriety it is sufficient that
either (a) $n \geq 2$ and $\pi(\mu, \sigma, \xi) \propto \pi(\xi)/\sigma$, for $\sigma > 0$, $\mu, \xi \in \mathbb{R}$, where $\int \pi(\xi)\, d\xi$ is finite, or (b) $n \geq 4$
and $\pi(\mu, \sigma, \xi) \propto 1/\sigma$, for $\sigma > 0$, $\mu, \xi \in \mathbb{R}$.

One possible extension of our work is to regression modelling using extreme value response
distributions. For example, Roy and Dey (2014) use GEV regression modelling to analyze reliability
data. They prove posterior propriety under conditions on the prior for $(\sigma, \xi)$ that are stronger than
those in our theorems 4 and 7. Future work will investigate our conjecture that the conditions in Roy
and Dey (2014) can be weakened. Another extension is to explore other formal rules for constructing
priors, such as reference priors (Berger et al., 2009) and probability matching priors (Datta et al.,
2009). Ho (2010) considers the latter for the GP distribution.
6. Appendix

6.1 Moments of a GP distribution

We give some moments of the GP distribution for later use. Suppose that $Z \sim GP(\sigma, \xi)$, where
$\xi < 1/r$. Then (Giles and Feng, 2009)

$$\mathrm{E}(Z^r) = \frac{r!\, \sigma^r}{\prod_{i=1}^{r} (1 - i\xi)}, \qquad r = 1, 2, \ldots \tag{6.1}$$

Now suppose that $\xi < 0$. Then, for a constant $a > \xi$, and using the substitution $x = -\xi z/\sigma$, we
have

$$\mathrm{E}(Z^{-a/\xi}) = (-\xi)^{a/\xi - 1} \sigma^{-a/\xi} \int_0^1 x^{-a/\xi} (1 - x)^{-(1+1/\xi)}\, dx = (-\xi)^{a/\xi - 1} \sigma^{-a/\xi}\, \frac{\Gamma(1 - a/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - (a+1)/\xi)}, \tag{6.2}$$

where we have used integral number 1 in section 3.251 on page 324 of Gradshteyn and Ryzhik (2007),
namely

$$\int_0^1 x^{\mu - 1} (1 - x^{\lambda})^{\nu - 1}\, dx = \frac{1}{\lambda}\, \mathrm{Beta}\left(\frac{\mu}{\lambda}, \nu\right) = \frac{1}{\lambda}\, \frac{\Gamma(\mu/\lambda)\, \Gamma(\nu)}{\Gamma(\mu/\lambda + \nu)}, \qquad \lambda > 0,\ \nu > 0,\ \mu > 0,$$

with $\lambda = 1$, $\mu = 1 - a/\xi$ and $\nu = -1/\xi$.

In the following proofs we use the generic notation $\pi(\xi)$ for the component of the prior relating
to $\xi$: the form of $\pi(\xi)$ varies depending on the prior being considered.
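
The two moment formulae can be checked numerically. The sketch below is only an illustration with hypothetical function names: it codes (6.1) and (6.2) and compares them with a midpoint-rule approximation based on the GP quantile function, restricted to $\xi < 0$ so that the support is bounded.

```python
import math


def gp_moment(r, sigma, xi):
    """E(Z^r) from (6.1) for a positive integer r; requires xi < 1/r."""
    if xi >= 1.0 / r:
        raise ValueError("(6.1) requires xi < 1/r")
    prod = 1.0
    for i in range(1, r + 1):
        prod *= 1.0 - i * xi
    return math.factorial(r) * sigma ** r / prod


def gp_frac_moment(a, sigma, xi):
    """E(Z^(-a/xi)) from (6.2); requires xi < 0 and a > xi."""
    if xi >= 0.0 or a <= xi:
        raise ValueError("(6.2) requires xi < 0 and a > xi")
    return ((-xi) ** (a / xi - 1.0) * sigma ** (-a / xi)
            * math.gamma(1.0 - a / xi) * math.gamma(-1.0 / xi)
            / math.gamma(1.0 - (a + 1.0) / xi))


def gp_quantile(p, sigma, xi):
    """Quantile function of GP(sigma, xi) for xi != 0."""
    return sigma * ((1.0 - p) ** (-xi) - 1.0) / xi


def numeric_moment(power, sigma, xi, n=200000):
    """Midpoint-rule approximation to E(Z^power) = int_0^1 Q(p)^power dp."""
    h = 1.0 / n
    return h * sum(gp_quantile((k + 0.5) * h, sigma, xi) ** power
                   for k in range(n))


if __name__ == "__main__":
    print(gp_moment(4, 1.0, -0.5), gp_frac_moment(2.0, 1.0, -0.5))
    print(numeric_moment(2, 1.0, -0.5), gp_moment(2, 1.0, -0.5))
```

With $\xi = -1/2$ and $a = 2$ the power $-a/\xi$ equals 4, so (6.1) and (6.2) describe the same moment and the first printed pair should agree.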

6.2 Proof of theorem 1 and its corollary


This is a trivial extension of the proof of theorem 1 in Eugenia Castellanos and Cabras (2007).
Suppose $m = 1$, with an observation $z$. The normalizing constant $C_1$ of the posterior distribution is
given by

$$C_1 = \int_{-\infty}^{0} \pi(\xi) \int_{-\xi z}^{\infty} \sigma^{-2} (1 + \xi z/\sigma)^{-(1+1/\xi)}\, d\sigma\, d\xi + \int_{0}^{\infty} \pi(\xi) \int_{0}^{\infty} \sigma^{-2} (1 + \xi z/\sigma)^{-(1+1/\xi)}\, d\sigma\, d\xi = \frac{1}{z} \int_{-\infty}^{\infty} \pi(\xi)\, d\xi.$$

If the latter integral is finite, that is, $\pi(\xi)$ is proportional to a proper density function, then the
posterior distribution is proper for $m = 1$ and therefore, by successive iterations of Bayes’ theorem,
it is proper for $m \geq 1$.

The corollary follows directly. □

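6.3 Proof of theorem 2

Let $A(\xi) = e^{-\xi}$ and $B(\sigma, \xi) = \sigma^{-(m+1)} \prod_{i=1}^{m} (1 + \xi z_i/\sigma)^{-(1+1/\xi)}$. Then, from (3.1) we have

$$C_m = \int_{-\infty}^{\infty} A(\xi) \int_{\max(0, -\xi z_m)}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi = \int_{-\infty}^{-1} A(\xi) \int_{-\xi z_m}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi + \int_{-1}^{0} A(\xi) \int_{-\xi z_m}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi + \int_{0}^{\infty} A(\xi) \int_{0}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi.$$

The latter two integrals converge for $m \geq 1$. However, the first integral diverges for all sample
sizes. For $\xi < -1$, $(1 + \xi z/\sigma)^{-(1+1/\xi)} > 1$ when $z$ is in the support $(0, -\sigma/\xi)$ of the GP($\sigma, \xi$) density.
Therefore $B(\sigma, \xi) > \sigma^{-(m+1)}$. Thus, the first integral above satisfies

$$\begin{aligned}
\int_{-\infty}^{-1} A(\xi) \int_{-\xi z_m}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi &> \int_{-\infty}^{-1} A(\xi) \int_{-\xi z_m}^{\infty} \sigma^{-(m+1)}\, d\sigma\, d\xi \\
&= \int_{-\infty}^{-1} A(\xi) \left[-\frac{\sigma^{-m}}{m}\right]_{-\xi z_m}^{\infty} d\xi \\
&= \frac{1}{m} \int_{-\infty}^{-1} A(\xi)\, [-\xi z_m]^{-m}\, d\xi \\
&= \frac{1}{m z_m^m} \int_{1}^{\infty} v^{-m} e^{v}\, dv,
\end{aligned}$$

where $v = -\xi$. This integral is divergent for all $m \geq 1$, so there is no sample size for which the
posterior is proper. □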
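
A finite computation cannot prove divergence, but it can illustrate it. The sketch below uses a hypothetical helper, `partial_integral`, to approximate $\int_1^V v^{-m} e^{v}\, dv$ by the midpoint rule for increasing upper limits $V$; the values grow rapidly whatever sample size $m$ is chosen.

```python
import math


def partial_integral(m, upper, n=100000):
    """Midpoint approximation to the integral of v^(-m) e^v over (1, upper)."""
    h = (upper - 1.0) / n
    total = 0.0
    for k in range(n):
        v = 1.0 + (k + 0.5) * h
        total += math.exp(v) / v ** m
    return h * total


if __name__ == "__main__":
    # The exponential factor eventually dominates any fixed power of v.
    for upper in (10.0, 20.0, 40.0):
        print(upper, partial_integral(5, upper))
```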
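6.4 Proof of theorem 3

We need to show that $C_3$ is finite. We split the range of integration over $\xi$ so that $C_3 = I_1 + I_2 + I_3$,
where

$$I_1 = \int_{-\infty}^{-1} \int_{-\xi z_3}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi, \qquad I_2 = \int_{-1}^{0} \int_{-\xi z_3}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi, \qquad I_3 = \int_{0}^{\infty} \int_{0}^{\infty} B(\sigma, \xi)\, d\sigma\, d\xi$$

and $B(\sigma, \xi) = \sigma^{-4} \prod_{i=1}^{3} (1 + \xi z_i/\sigma)^{-(1+1/\xi)}$. For convenience we let $\rho = \xi/\sigma$.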

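Proof that $I_1$ is finite. We have $\xi < -1$ and so $-(1 + 1/\xi) < 0$, $\rho < 0$ and $0 < 1 + \rho z_i < 1$ for
$i = 1, 2, 3$. Noting that $-\rho z_3 < 1$ gives

$$\begin{aligned}
(1 + \rho z_1)(1 + \rho z_2)(1 + \rho z_3) &> (-\rho z_3 + \rho z_1)(-\rho z_3 + \rho z_2)(1 + \rho z_3) \\
&= (-\rho)^2 (z_3 - z_1)(z_3 - z_2)(1 + \rho z_3) \\
&= (-\xi)^2 \sigma^{-2} (z_3 - z_1)(z_3 - z_2)(1 + \rho z_3).
\end{aligned} \tag{6.3}$$

Therefore,

$$\prod_{i=1}^{3} \left(1 + \frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)} < (-\xi)^{-2(1+1/\xi)}\, \sigma^{2(1+1/\xi)}\, [(z_3 - z_2)(z_3 - z_1)]^{-(1+1/\xi)} \left(1 + \frac{\xi z_3}{\sigma}\right)^{-(1+1/\xi)}.$$

Thus,

$$I_1 \leq \int_{-\infty}^{-1} [(z_3 - z_2)(z_3 - z_1)]^{-(1+1/\xi)}\, (-\xi)^{-2(1+1/\xi)}\, I_{1\sigma}\, d\xi,$$

where

$$\begin{aligned}
I_{1\sigma} &= \int_{-\xi z_3}^{\infty} \sigma^{-4}\, \sigma^{2(1+1/\xi)} \left(1 + \frac{\xi z_3}{\sigma}\right)^{-(1+1/\xi)} d\sigma \\
&= z_3^{-1} \int_{0}^{-1/(\xi z_3)} v^{-2/\xi}\, z_3\, (1 + \xi z_3 v)^{-(1+1/\xi)}\, dv \\
&= (-\xi)^{2/\xi - 1}\, z_3^{-(1 - 2/\xi)}\, \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)},
\end{aligned}$$

where $v = 1/\sigma$ and the last line follows from (6.2) with $a = 2$ and $\sigma = z_3^{-1}$. Therefore

$$\begin{aligned}
I_1 &\leq \int_{-\infty}^{-1} (-\xi)^{-3}\, [(z_3 - z_2)(z_3 - z_1)]^{-(1+1/\xi)}\, z_3^{-(1 - 2/\xi)}\, \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)}\, d\xi \\
&= [z_3 (z_3 - z_2)(z_3 - z_1)]^{-1} \int_{-\infty}^{-1} (-\xi)^{-3} \left(1 - \frac{z_2}{z_3}\right)^{-1/\xi} \left(1 - \frac{z_1}{z_3}\right)^{-1/\xi} \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)}\, d\xi \\
&= [z_3 (z_3 - z_2)(z_3 - z_1)]^{-1} \int_{0}^{1} x \left(1 - \frac{z_2}{z_3}\right)^{x} \left(1 - \frac{z_1}{z_3}\right)^{x} \frac{\Gamma(1 + 2x)\, \Gamma(x)}{\Gamma(1 + 3x)}\, dx \\
&= [z_3 (z_3 - z_2)(z_3 - z_1)]^{-1} \int_{0}^{1} \left(1 - \frac{z_2}{z_3}\right)^{x} \left(1 - \frac{z_1}{z_3}\right)^{x} \frac{\Gamma(1 + 2x)\, \Gamma(1 + x)}{\Gamma(1 + 3x)}\, dx,
\end{aligned} \tag{6.4}$$

where $x = -1/\xi$ and we have used the relation $\Gamma(1 + x) = x\Gamma(x)$. The integrand in (6.4) is finite
over the range of integration so this integral is finite and therefore $I_1$ is finite.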

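Proof that $I_2$ is finite. We have $-1 < \xi < 0$, so $-(1 + 1/\xi) > 0$ and $(1 + \xi z/\sigma)^{-(1+1/\xi)} < 1$ and
decreases in $z$ over $(0, -\sigma/\xi)$. Therefore,

$$\begin{aligned}
I_2 &= \int_{-1}^{0} \int_{-\xi z_3}^{\infty} \sigma^{-4} \prod_{i=1}^{3} \left(1 + \frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)} d\sigma\, d\xi \\
&\leq \int_{-1}^{0} \int_{-\xi z_3}^{\infty} \sigma^{-4} \left(1 + \frac{\xi z_3}{\sigma}\right)^{-(1+1/\xi)} d\sigma\, d\xi \\
&= \int_{-1}^{0} z_3^{-1} \int_{0}^{-1/(\xi z_3)} v^{2}\, z_3\, (1 + \xi z_3 v)^{-(1+1/\xi)}\, dv\, d\xi \\
&= \int_{-1}^{0} \frac{2 z_3^{-3}}{(1 - \xi)(1 - 2\xi)}\, d\xi \\
&= 2 z_3^{-3} \int_{-1}^{0} \left[\left(\tfrac{1}{2} - \xi\right)^{-1} - (1 - \xi)^{-1}\right] d\xi \\
&= 2 z_3^{-3} \ln(3/2),
\end{aligned}$$

where $v = 1/\sigma$ and the integral over $v$ follows from (6.1) with $r = 2$ and $\sigma = z_3^{-1}$.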


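Proof that $I_3$ is finite. We have $\xi > 0$ so $-(1 + 1/\xi) < 0$. Let $g_n = \left(\prod_{i=1}^{n} z_i\right)^{1/n}$. Mitrinović
(1964, page 130):

$$\prod_{k=1}^{n} (1 + a_k) \geq (1 + b)^n, \qquad a_k > 0;\ \prod_{k=1}^{n} a_k = b^n, \tag{6.5}$$

with $a_k = \xi z_k/\sigma$ and $b = \xi g_3/\sigma$ gives

$$\prod_{i=1}^{3} \left(1 + \frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)} \leq \left(1 + \frac{\xi g_3}{\sigma}\right)^{-3(1+1/\xi)}$$

and therefore

$$\begin{aligned}
I_3 &= \int_{0}^{\infty} \int_{0}^{\infty} \sigma^{-4} \prod_{i=1}^{3} \left(1 + \frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)} d\sigma\, d\xi \\
&\leq \int_{0}^{\infty} \int_{0}^{\infty} \sigma^{-4} \left(1 + \frac{\xi g_3}{\sigma}\right)^{-3(1+1/\xi)} d\sigma\, d\xi \\
&= \int_{0}^{\infty} \int_{0}^{\infty} v^{2}\, (1 + \xi g_3 v)^{-3(1+1/\xi)}\, dv\, d\xi \\
&= \int_{0}^{\infty} \beta \int_{0}^{\infty} v^{2}\, \frac{1}{\beta} \left(1 + \frac{\alpha v}{\beta}\right)^{-(1+1/\alpha)} dv\, d\xi,
\end{aligned}$$

where $v = 1/\sigma$, $\alpha = 1/(2 + 3/\xi)$ and $\beta = \alpha/(\xi g_3) = 1/[(3 + 2\xi) g_3]$. For $\xi > 0$, $\alpha < 1/2$ so using (6.1)
with $r = 2$, $\sigma = \beta$ and $\xi = \alpha$ gives

$$\begin{aligned}
I_3 &\leq \int_{0}^{\infty} \beta\, \frac{2\beta^2}{(1 - \alpha)(1 - 2\alpha)}\, d\xi \\
&= \frac{2}{3}\, g_3^{-3} \int_{0}^{\infty} \frac{1}{(\xi + 3)(2\xi + 3)}\, d\xi \\
&= \frac{2}{9}\, g_3^{-3} \int_{0}^{\infty} \left[\frac{1}{\xi + 3/2} - \frac{1}{\xi + 3}\right] d\xi \\
&= \frac{2}{9}\, g_3^{-3} \ln 2.
\end{aligned}$$

The normalizing constant $C_3$ is finite, so $\pi_{U,GP}(\sigma, \xi)$ yields a proper posterior density for $m = 3$ and
therefore does so for $m \geq 3$. □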
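
The closed-form constants in the bounds for the second and third pieces of $C_3$ rest on two elementary integrals, $\int_{-1}^{0} [(1-\xi)(1-2\xi)]^{-1}\, d\xi = \ln(3/2)$ and $\int_{0}^{\infty} [(\xi+3)(2\xi+3)]^{-1}\, d\xi = (\ln 2)/3$. The sketch below checks both by a midpoint rule; the helper names and the change of variable $\xi = t/(1-t)$ for the half-line are choices made for this illustration only.

```python
import math


def midpoint(f, a, b, n=100000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def half_line(f, n=100000):
    """Integral of f over (0, inf) using the substitution xi = t / (1 - t)."""
    return midpoint(lambda t: f(t / (1.0 - t)) / (1.0 - t) ** 2, 0.0, 1.0, n)


def i2_constant():
    """Numerical value of the xi-integral in the bound for the second piece."""
    return midpoint(lambda xi: 1.0 / ((1.0 - xi) * (1.0 - 2.0 * xi)), -1.0, 0.0)


def i3_constant():
    """Numerical value of the xi-integral in the bound for the third piece."""
    return half_line(lambda xi: 1.0 / ((xi + 3.0) * (2.0 * xi + 3.0)))


if __name__ == "__main__":
    print(i2_constant(), math.log(1.5))
    print(i3_constant(), math.log(2.0) / 3.0)
```

The midpoint rule never evaluates the integrand at an end point, so the transformed integrand on $(0, 1)$ causes no division by zero.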
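6.5 Proof of theorem 4 and its corollary

Throughout the following proofs we define $\delta_i = y_i - y_1$, $i = 2, \ldots, n$.

We make the parameter transformation $\phi = \mu - \sigma/\xi$. Then the posterior density for $(\phi, \sigma, \xi)$ is
given by $\pi(\phi, \sigma, \xi \mid y) = K_n^{-1}\, \pi(\xi)\, G_n(\phi, \sigma)$, where

$$G_n(\phi, \sigma) = |\xi|^{-n(1+1/\xi)}\, \sigma^{n/\xi - 1} \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} \exp\left\{-|\xi|^{-1/\xi}\, \sigma^{1/\xi} \sum_{i=1}^{n} |y_i - \phi|^{-1/\xi}\right\},$$

$\sigma > 0$, if $\xi > 0$ then $\phi < y_1$ and if $\xi < 0$ then $\phi > y_n$.

We let $b = |\xi|^{-1/\xi} \sum_{i=1}^{n} |y_i - \phi|^{-1/\xi}$ and $v = \sigma^{1/\xi}$. The normalizing constant $K_n$ is given by

$$\begin{aligned}
K_n &= \int_{-\infty}^{\infty} \int \int_{0}^{\infty} \pi(\xi)\, G_n(\phi, \sigma)\, d\sigma\, d\phi\, d\xi \\
&= \int_{-\infty}^{\infty} \int \pi(\xi)\, |\xi|^{-n(1+1/\xi)} \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} \int_{0}^{\infty} \sigma^{n/\xi - 1} \exp\{-b \sigma^{1/\xi}\}\, d\sigma\, d\phi\, d\xi \\
&= \int_{-\infty}^{\infty} \int \pi(\xi)\, |\xi|^{-n(1+1/\xi)} \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} |\xi| \int_{0}^{\infty} v^{n-1} \exp\{-b v\}\, dv\, d\phi\, d\xi \\
&= \int_{-\infty}^{\infty} \int \pi(\xi)\, |\xi|^{-n(1+1/\xi)} \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} \Gamma(n)\, b^{-n}\, |\xi|\, d\phi\, d\xi \\
&= \int_{-\infty}^{\infty} \int \pi(\xi)\, |\xi|^{-n(1+1/\xi)} \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} (n-1)!\, |\xi|^{n/\xi + 1} \left\{\sum_{i=1}^{n} |y_i - \phi|^{-1/\xi}\right\}^{-n} d\phi\, d\xi \\
&= (n-1)! \int_{-\infty}^{\infty} \pi(\xi)\, |\xi|^{1-n} \int \left\{\prod_{i=1}^{n} |y_i - \phi|^{-(1+1/\xi)}\right\} \left\{\sum_{i=1}^{n} |y_i - \phi|^{-1/\xi}\right\}^{-n} d\phi\, d\xi,
\end{aligned} \tag{6.6}$$

where the integral over $\phi$ is over $\phi < y_1$ if $\xi > 0$ and over $\phi > y_n$ if $\xi < 0$.

For $n = 1$ the integral $\int |y_1 - \phi|^{-1}\, d\phi$ is divergent so if $n = 1$ the posterior is not
proper for any prior in this class.

Now we take $n = 2$ and for clarity consider the cases $\xi > 0$ and $\xi < 0$ separately, with respective
contributions $K_2^+$ and $K_2^-$ to $K_2$. For $\xi > 0$, using the substitution $u = (y_1 - \phi)^{-1}$ in (6.6) gives

$$\begin{aligned}
K_2^+ &= \int_{0}^{\infty} \pi(\xi)\, \xi^{-1} \int_{-\infty}^{y_1} \frac{(y_1 - \phi)^{-(1+1/\xi)} (y_2 - \phi)^{-(1+1/\xi)}}{\left\{(y_1 - \phi)^{-1/\xi} + (y_2 - \phi)^{-1/\xi}\right\}^2}\, d\phi\, d\xi \\
&= \int_{0}^{\infty} \pi(\xi)\, \xi^{-1} \int_{0}^{\infty} \frac{(1 + \delta_2 u)^{-(1+1/\xi)}}{\left\{1 + (1 + \delta_2 u)^{-1/\xi}\right\}^2}\, du\, d\xi \\
&= \frac{1}{2}\, \delta_2^{-1} \int_{0}^{\infty} \pi(\xi)\, d\xi,
\end{aligned}$$

the final step following because the $u$-integrand is a multiple of a shifted log-logistic density
function with location, scale and shape parameters of $0$, $\xi \delta_2^{-1}$ and $\xi$ respectively, and the location
of this distribution equals the median. For $\xi < 0$ an analogous calculation using the substitution
$v = (\phi - y_n)^{-1}$ in (6.6) gives

$$K_2^- = \frac{1}{2}\, \delta_2^{-1} \int_{-\infty}^{0} \pi(\xi)\, d\xi.$$

Therefore,

$$K_2 = K_2^+ + K_2^- = \frac{1}{2}\, \delta_2^{-1} \int_{-\infty}^{\infty} \pi(\xi)\, d\xi.$$

Thus, $K_2$ is finite if $\int_{-\infty}^{\infty} \pi(\xi)\, d\xi$ is finite, and the result follows.

The corollary follows directly. □

6.6 Proof of theorem 5

The crucial aspects are the rates at which $\pi(\xi) \to \infty$ as $\xi \downarrow -1/2$ and as $\xi \to \infty$.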


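The component $\pi(\xi)$ of (4.2) involving $\xi$ can be expressed as

$$\pi^2(\xi) = \frac{1}{\xi^4}\, (T_1 + T_2), \tag{6.7}$$

where

$$T_1 = \left[\frac{\pi^2}{6} + (1 - \gamma)^2\right] (1 + \xi)^2\, \Gamma(1 + 2\xi), \tag{6.8}$$

$$T_2 = \frac{\pi^2}{6} + \left\{2(1 - \gamma)(\gamma + \psi(1 + \xi)) - \frac{\pi^2}{3}\right\} \Gamma(2 + \xi) - [1 + \psi(1 + \xi)]^2\, [\Gamma(2 + \xi)]^2. \tag{6.9}$$

Firstly, we derive a lower bound for $\pi(\xi)$ that holds for $\xi > 3$. Using the duplication formula
(Abramowitz and Stegun, 1972, page 256; 6.1.18)

$$\Gamma(2z) = (2\pi)^{-1/2}\, 2^{2z - 1/2}\, \Gamma(z)\, \Gamma(z + 1/2),$$

with $z = 1/2 + \xi$ in (6.8) we have

$$T_1 = \left[\frac{\pi^2}{6} + (1 - \gamma)^2\right] (1 + \xi)^2\, \pi^{-1/2}\, 2^{2\xi}\, \Gamma(1/2 + \xi)\, \Gamma(1 + \xi).$$

We note that

$$\Gamma(1/2 + \xi) = \frac{\Gamma(3/2 + \xi)}{1/2 + \xi} > \frac{\Gamma(1 + \xi)}{1/2 + \xi} = \frac{2\, \Gamma(1 + \xi)}{1 + 2\xi} > \frac{\Gamma(1 + \xi)}{1 + \xi},$$

where for the first inequality to hold it is sufficient that $\xi > 1/2$; and that, for $\xi > 3$, $2^{2\xi} > (1 + \xi)^3$.
Therefore,

$$T_1 > \left[\frac{\pi^2}{6} + (1 - \gamma)^2\right] \pi^{-1/2}\, (1 + \xi)^4\, [\Gamma(1 + \xi)]^2. \tag{6.10}$$

Completing the square in (6.9) gives

$$T_2 = -\left\{[1 + \psi(1 + \xi)]\, \Gamma(2 + \xi) + f(\xi)\right\}^2 + [f(\xi)]^2 + \pi^2/6,$$

where

$$f(\xi) = \frac{\pi^2/6 - (1 - \gamma)(\gamma + \psi(1 + \xi))}{1 + \psi(1 + \xi)} = \frac{\pi^2/6 + (1 - \gamma)^2}{1 + \psi(1 + \xi)} - (1 - \gamma)$$

and $[f(\xi)]^2 + \pi^2/6 > 0$.

For $\xi > 0$, $\psi(1 + \xi)$ increases with $\xi$ and so $f(\xi)$ decreases with $\xi$. Therefore, for $\xi > 3$,
$f(\xi) < f(3) \approx 0.39$ and

$$T_2 > -\left\{[1 + \psi(1 + \xi)]\, \Gamma(2 + \xi) + f(3)\right\}^2.$$

For $\xi > 0$, we have $\psi(1 + \xi) < \ln(1 + \xi) - (1 + \xi)^{-1}/2$ (Qiu and Vuorinen, 2004, theorem C) and
$\ln(1 + \xi) < \xi$ (Abramowitz and Stegun, 1972, page 68; 4.1.33). Therefore, noting that $\Gamma(2 + \xi) =
(1 + \xi)\, \Gamma(1 + \xi)$, we have

$$T_2 > -\left\{(1 + \xi)^2\, \Gamma(1 + \xi) - \tfrac{1}{2}\, \Gamma(1 + \xi) + f(3)\right\}^2.$$

For $\xi > 3$, $f(3) - \Gamma(1 + \xi)/2 < 0$ so

$$T_2 > -(1 + \xi)^4\, [\Gamma(1 + \xi)]^2. \tag{6.11}$$

Substituting (6.10) and (6.11) in (6.7) gives, for $\xi > 3$,

$$\begin{aligned}
\pi^2(\xi) &> \frac{(1 + \xi)^4}{\xi^4} \left\{\left[\frac{\pi^2}{6} + (1 - \gamma)^2\right] \pi^{-1/2} - 1\right\} [\Gamma(1 + \xi)]^2 \\
&> c\, [\Gamma(1 + \xi)]^2 \\
&> c\, (1 + \xi)^{2(\lambda \xi - \gamma)},
\end{aligned}$$

where $c = (4/3)^4 \{[\pi^2/6 + (1 - \gamma)^2]\, \pi^{-1/2} - 1\} \approx 0.0913$ and the final step uses the inequality
$\Gamma(x) > x^{\lambda(x - 1) - \gamma}$ for $x > 1$ (Alzer, 1999), where $\lambda = (\pi^2/6 - \gamma)/2 \approx 0.534$. Thus, a lower bound
for the $\xi$ component of the Jeffreys prior (4.2) is given by

$$\pi(\xi) > c^{1/2}\, (1 + \xi)^{\lambda \xi - \gamma}, \qquad \text{for } \xi > 3. \tag{6.12}$$

[In fact, numerical work shows that this lower bound holds for $\xi > -1/2$.]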

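Let $K_n^+$ denote the contribution to $K_n$ for $\xi > 3$. Using the substitution $u = (y_1 - \phi)^{-1}$ in (6.6)
gives

$$K_n^+ = (n - 1)! \int_{3}^{\infty} \pi(\xi)\, \xi^{1-n} \int_{0}^{\infty} u^{n-2} \prod_{i=2}^{n} (1 + \delta_i u)^{-(1+1/\xi)} \left\{1 + \sum_{i=2}^{n} (1 + \delta_i u)^{-1/\xi}\right\}^{-n} du\, d\xi. \tag{6.13}$$

For $\xi > 0$ we have $1 + \sum_{i=2}^{n} (1 + \delta_i u)^{-1/\xi} \leq n$ and $\prod_{i=2}^{n} (1 + \delta_i u)^{-(1+1/\xi)} \geq (1 + \delta_n u)^{-(n-1)(1+1/\xi)}$.
Applying these inequalities to (6.13) gives

$$\begin{aligned}
K_n^+ &\geq n^{-n} (n - 1)! \int_{3}^{\infty} \pi(\xi)\, \xi^{1-n} \int_{0}^{\infty} u^{n-2}\, (1 + \delta_n u)^{-(n-1)(1+1/\xi)}\, du\, d\xi \\
&= n^{-n} (n - 1)! \int_{3}^{\infty} \pi(\xi)\, \xi^{1-n}\, \beta \int_{0}^{\infty} u^{n-2}\, \frac{1}{\beta} \left(1 + \frac{\alpha u}{\beta}\right)^{-(1+1/\alpha)} du\, d\xi,
\end{aligned} \tag{6.14}$$

where $\beta = \alpha/\delta_n$ and $\alpha = [n - 2 + (n - 1)/\xi]^{-1}$ and $0 < \alpha < (n - 2)^{-1}$. The $u$-integrand is the density
function of a GP($\beta, \alpha$) distribution and so, using (6.1) with $r = n - 2$, the integral over $u$ is given by

$$(n - 2)!\, \beta^{n-2} \prod_{i=1}^{n-2} \frac{1}{1 - i\alpha} = (n - 2)!\, \delta_n^{2-n} \prod_{i=1}^{n-2} \frac{\xi}{(n - 2 - i)\xi + n - 1}. \tag{6.15}$$

Substituting (6.15) into (6.14) gives

$$\begin{aligned}
K_n^+ &\geq n^{-n} (n - 1)!\, (n - 2)!\, \delta_n^{1-n} \int_{3}^{\infty} \pi(\xi)\, \frac{1}{(n - 2)\xi + n - 1} \prod_{i=1}^{n-2} \frac{1}{(n - 2 - i)\xi + n - 1}\, d\xi \\
&= n^{-n} (n - 1)!\, (n - 2)!\, \delta_n^{1-n} \int_{3}^{\infty} \pi(\xi) \prod_{i=0}^{n-2} \frac{1}{(n - 2 - i)\xi + n - 1}\, d\xi \\
&= n^{-n} (n - 1)!\, (n - 2)!\, \delta_n^{1-n}\, (n - 1)^{1-n} \int_{3}^{\infty} \pi(\xi) \prod_{i=0}^{n-2} \left[1 + \frac{n - 2 - i}{n - 1}\, \xi\right]^{-1} d\xi \\
&> C(n) \int_{3}^{\infty} \pi(\xi)\, (1 + \xi)^{2-n}\, d\xi,
\end{aligned}$$

where $C(n) = n^{-n} (n - 1)!\, (n - 2)!\, \delta_n^{1-n}\, (n - 1)^{1-n}$. Applying (6.12) gives

$$K_n^+ > C(n)\, c^{1/2} \int_{3}^{\infty} (1 + \xi)^{2 - n + \lambda \xi - \gamma}\, d\xi.$$

For any sample size $n$ the integrand $\to \infty$ as $\xi \to \infty$. Therefore, the integral diverges and the result
follows. □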


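Now we derive an upper bound for $\pi^2(\xi)$ that applies for $\xi$ close to $-1/2$. We note that for
$-1/2 < \xi < 0$ we have $\Gamma(1 + 2\xi) = \Gamma(2 + 2\xi)/(1 + 2\xi) < (1 + 2\xi)^{-1}$. From (6.7) we have

$$\pi^2(\xi) = \left[\frac{\pi^2}{6} + (1 - \gamma)^2\right] \frac{(1 + \xi)^2}{\xi^4}\, \Gamma(1 + 2\xi) + \frac{T_2}{\xi^4},$$

where $T_2 \to -3.039$ as $\xi \downarrow -1/2$. Noting that $(1 + \xi)^2/\xi^4 \to 4$ as $\xi \downarrow -1/2$ shows that
$\pi(\xi) < 2\left[\pi^2/6 + (1 - \gamma)^2\right]^{1/2} (1 + 2\xi)^{-1/2}$ for $\xi \in (-1/2, -1/2 + \epsilon)$, for some $\epsilon > 0$. In fact numerical work
shows that $\epsilon \approx 1.29$.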


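6.7 Proof of theorem 6

We show that the integral $K_n^-$, giving the contribution to the normalising constant from $\xi < -1$,
diverges. From the proof of theorem 4 we have

$$K_n^- = (n - 1)! \int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\, (-\xi)^{1-n} \int_{y_n}^{\infty} \left\{\prod_{i=1}^{n} (\phi - y_i)^{-(1+1/\xi)}\right\} \left\{\sum_{i=1}^{n} (\phi - y_i)^{-1/\xi}\right\}^{-n} d\phi\, d\xi.$$

For $\xi < -1$ we have $-(1 + 1/\xi) < 0$ and $-1/\xi > 0$. Therefore, for $i = 2, \ldots, n$,
$(\phi - y_i)^{-(1+1/\xi)} > (\phi - y_1)^{-(1+1/\xi)}$ and $(\phi - y_i)^{-1/\xi} < (\phi - y_1)^{-1/\xi}$, and thus the $\phi$-integrand is
greater than $n^{-n} (\phi - y_1)^{-n}$. Therefore,

$$\begin{aligned}
K_n^- &> (n - 1)! \int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\, (-\xi)^{1-n} \int_{y_n}^{\infty} n^{-n} (\phi - y_1)^{-n}\, d\phi\, d\xi \\
&= (n - 1)!\, n^{-n}\, (n - 1)^{-1}\, (y_n - y_1)^{1-n} \int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\, (-\xi)^{1-n}\, d\xi \\
&= (n - 2)!\, n^{-n}\, (y_n - y_1)^{1-n} \int_{1}^{\infty} x^{1-n}\, e^{-\gamma(1-x)}\, dx,
\end{aligned}$$

where $x = -\xi$. For all sample sizes $n$ this integral diverges so the result follows. □

6.8 Proof of theorem 7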


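We need to show that $K_4$ is finite. We split the range of integration over $\xi$ in (6.6) so that
$K_4 = J_1 + J_2 + J_3$, with respective contributions from $\xi < -1$, $-1 \leq \xi \leq 0$ and $\xi > 0$.

Proof that $J_1$ is finite. We use the substitution $u = (\phi - y_1)^{-1}$ in (6.6) to give

$$\begin{aligned}
J_1 &= 3! \int_{-\infty}^{-1} (-\xi)^{-3} \int_{y_4}^{\infty} \left\{\prod_{i=1}^{4} (\phi - y_i)^{-(1+1/\xi)}\right\} \left\{\sum_{i=1}^{4} (\phi - y_i)^{-1/\xi}\right\}^{-4} d\phi\, d\xi \\
&= 3! \int_{-\infty}^{-1} (-\xi)^{-3} \int_{0}^{1/\delta_4} u^{2} \prod_{i=2}^{4} (1 - \delta_i u)^{-(1+1/\xi)} \left\{1 + \sum_{i=2}^{4} (1 - \delta_i u)^{-1/\xi}\right\}^{-4} du\, d\xi.
\end{aligned}$$

A similar calculation to (6.3) gives

$$\prod_{i=2}^{4} (1 - \delta_i u)^{-(1+1/\xi)} < \left\{\prod_{i=2}^{3} (\delta_4 - \delta_i)\right\}^{-(1+1/\xi)} u^{-2(1+1/\xi)}\, (1 - \delta_4 u)^{-(1+1/\xi)}.$$

Noting also that $1 + \sum_{i=2}^{4} (1 - \delta_i u)^{-1/\xi} \geq 1$ we have

$$\begin{aligned}
J_1 &\leq 3! \int_{-\infty}^{-1} (-\xi)^{-3} \left\{\prod_{i=2}^{3} (\delta_4 - \delta_i)\right\}^{-(1+1/\xi)} \int_{0}^{1/\delta_4} u^{-2/\xi}\, (1 - \delta_4 u)^{-(1+1/\xi)}\, du\, d\xi \\
&= 3! \int_{-\infty}^{-1} (-\xi)^{-3} \left\{\prod_{i=2}^{3} (\delta_4 - \delta_i)\right\}^{-(1+1/\xi)} \beta \int_{0}^{1/\delta_4} u^{-2/\xi}\, \frac{1}{\beta} \left(1 + \frac{\xi u}{\beta}\right)^{-(1+1/\xi)} du\, d\xi \\
&= 3! \int_{-\infty}^{-1} (-\xi)^{-3} \left\{\prod_{i=2}^{3} (\delta_4 - \delta_i)\right\}^{-(1+1/\xi)} \delta_4^{-(1 - 2/\xi)}\, \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)}\, d\xi,
\end{aligned}$$

where $\beta = -\xi/\delta_4$ and the last line follows from (6.2) with $a = 2$ and $\sigma = \beta$.
Therefore,

$$\begin{aligned}
J_1 &\leq 3! \int_{-\infty}^{-1} (-\xi)^{-3}\, (y_4 - y_1)^{-(1 - 2/\xi)} \prod_{i=2}^{3} (y_4 - y_i)^{-(1+1/\xi)}\, \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)}\, d\xi \\
&= 3! \prod_{i=1}^{3} (y_4 - y_i)^{-1} \int_{-\infty}^{-1} (-\xi)^{-3} \left(\prod_{i=2}^{3} \frac{y_4 - y_i}{y_4 - y_1}\right)^{-1/\xi} \frac{\Gamma(1 - 2/\xi)\, \Gamma(-1/\xi)}{\Gamma(1 - 3/\xi)}\, d\xi \\
&= 3! \prod_{i=1}^{3} (y_4 - y_i)^{-1} \int_{0}^{1} x \left(\prod_{i=2}^{3} \frac{y_4 - y_i}{y_4 - y_1}\right)^{x} \frac{\Gamma(1 + 2x)\, \Gamma(x)}{\Gamma(1 + 3x)}\, dx \\
&= 3! \prod_{i=1}^{3} (y_4 - y_i)^{-1} \int_{0}^{1} \left(\prod_{i=2}^{3} \frac{y_4 - y_i}{y_4 - y_1}\right)^{x} \frac{\Gamma(1 + 2x)\, \Gamma(1 + x)}{\Gamma(1 + 3x)}\, dx,
\end{aligned} \tag{6.16}$$

where $x = -1/\xi$ and we have used the relation $\Gamma(1 + x) = x\Gamma(x)$. The integrand in (6.16) is finite
over the range of integration so this integral is finite and therefore $J_1$ is finite.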

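Proof that $J_2$ is finite. Using the substitution $u = (\phi - y_1)^{-1}$ in (6.6) gives

$$J_2 = 3! \int_{-1}^{0} (-\xi)^{-3} \int_{0}^{1/\delta_4} u^{2} \prod_{i=2}^{4} (1 - \delta_i u)^{-(1+1/\xi)} \left\{1 + \sum_{i=2}^{4} (1 - \delta_i u)^{-1/\xi}\right\}^{-4} du\, d\xi.$$

For $-1 \leq \xi < 0$ we have $-(1 + 1/\xi) \geq 0$. Noting that $0 < 1 - \delta_i u < 1$ gives

$$\prod_{i=2}^{4} (1 - \delta_i u)^{-(1+1/\xi)} \leq (1 - \delta_4 u)^{-(1+1/\xi)}.$$

Noting also that $1 + \sum_{i=2}^{4} (1 - \delta_i u)^{-1/\xi} \geq 1$ we have

$$\begin{aligned}
J_2 &\leq 3! \int_{-1}^{0} (-\xi)^{-3} \int_{0}^{1/\delta_4} u^{2}\, (1 - \delta_4 u)^{-(1+1/\xi)}\, du\, d\xi \\
&= 3! \int_{-1}^{0} (-\xi)^{-3}\, \beta \int_{0}^{1/\delta_4} u^{2}\, \frac{1}{\beta} \left(1 + \frac{\xi u}{\beta}\right)^{-(1+1/\xi)} du\, d\xi \\
&= 3! \int_{-1}^{0} (-\xi)^{-3}\, \beta\, \frac{2\beta^2}{(1 - \xi)(1 - 2\xi)}\, d\xi \\
&= 12\, (y_4 - y_1)^{-3} \ln(3/2),
\end{aligned}$$

where $\beta = -\xi/\delta_4$ and the penultimate line follows from (6.1) with $r = 2$ and $\sigma = \beta$.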
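Proof that $J_3$ is finite. Using the substitution $u = (y_1 - \phi)^{-1}$ in (6.6) gives

$$\begin{aligned}
J_3 &= 3! \int_{0}^{\infty} \xi^{-3} \int_{-\infty}^{y_1} \left\{\prod_{i=1}^{4} (y_i - \phi)^{-(1+1/\xi)}\right\} \left\{\sum_{i=1}^{4} (y_i - \phi)^{-1/\xi}\right\}^{-4} d\phi\, d\xi \\
&= 3! \int_{0}^{\infty} \xi^{-3} \int_{0}^{\infty} u^{2} \prod_{i=2}^{4} (1 + \delta_i u)^{-(1+1/\xi)} \left\{1 + \sum_{i=2}^{4} (1 + \delta_i u)^{-1/\xi}\right\}^{-4} du\, d\xi.
\end{aligned}$$

Noting that for $\xi > 0$ we have $-(1 + 1/\xi) < 0$, using (6.5) with $a_k = \delta_k u$ gives

$$\prod_{i=2}^{4} (1 + \delta_i u)^{-(1+1/\xi)} \leq (1 + g u)^{-3(1+1/\xi)},$$

where $g = (\delta_2 \delta_3 \delta_4)^{1/3}$. Noting also that $1 + \sum_{i=2}^{4} (1 + \delta_i u)^{-1/\xi} \geq 1$ we have

$$\begin{aligned}
J_3 &\leq 3! \int_{0}^{\infty} \xi^{-3} \int_{0}^{\infty} u^{2}\, (1 + g u)^{-3(1+1/\xi)}\, du\, d\xi \\
&= 3! \int_{0}^{\infty} \xi^{-3}\, \beta \int_{0}^{\infty} u^{2}\, \frac{1}{\beta} \left(1 + \frac{\alpha u}{\beta}\right)^{-(1+1/\alpha)} du\, d\xi,
\end{aligned}$$

where $\alpha = \xi/(2\xi + 3)$ and $\beta = \alpha/g$. Therefore, (6.1) with $r = 2$, $\sigma = \beta$ and $\xi = \alpha$ gives

$$\begin{aligned}
J_3 &\leq 3! \int_{0}^{\infty} \xi^{-3}\, \beta\, \frac{2\beta^2}{(1 - \alpha)(1 - 2\alpha)}\, d\xi \\
&= 4 g^{-3} \int_{0}^{\infty} \frac{1}{(\xi + 3)(2\xi + 3)}\, d\xi \\
&= \frac{4}{3}\, g^{-3} \int_{0}^{\infty} \left[\frac{1}{\xi + 3/2} - \frac{1}{\xi + 3}\right] d\xi \\
&= \frac{4}{3}\, g^{-3} \ln 2.
\end{aligned}$$

The normalizing constant $K_4$ is finite, so $\pi_{U,GEV}(\mu, \sigma, \xi)$ yields a proper posterior density for $n = 4$
and therefore does so for $n \geq 4$. □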